Human or AI? Comparing Design Thinking Assessments by Teaching Assistants and Bots
Khan, Sumbul, Liow, Wei Ting, Ang, Lay Kee
ORCID: 0000-0003-2811-1194
Abstract -- As design thinking education grows in secondary and tertiary education, educators face a mounting challenge in evaluating creative artefacts that comprise visual and textual elements. Traditional, rubric-based methods of assessment are laborious, time-consuming, and inconsistent, owing to their reliance on Teaching Assistants (TAs) in large, multi-section cohorts. This paper presents an exploratory study investigating the reliability and perceived accuracy of AI-assisted assessment vis-à-vis TA-assisted assessment in evaluating student posters in design thinking education. Two activities were conducted with 33 Ministry of Education (MOE), Singapore school teachers, with the objectives of (1) comparing AI-generated scores with TA grading across three key dimensions: empathy and user understanding, identification of pain points and opportunities, and visual communication, and (2) understanding teacher preferences among AI-assigned, TA-assigned, and hybrid scores. Results showed low statistical agreement between instructor and AI scores for empathy and pain points, though slightly higher alignment for visual communication. Teachers generally preferred TA-assigned scores in six of ten samples. Qualitative feedback highlighted AI's potential for formative feedback, consistency, and student self-reflection, but raised concerns about its limitations in capturing contextual nuance and creative insight. The study underscores the need for hybrid assessment models that integrate computational efficiency with human insight. This research contributes to the evolving conversation around responsible AI adoption in creative disciplines, emphasizing the balance between automation and human judgment for scalable and pedagogically sound assessment practices.
Design thinking is a human-centered approach to innovation that draws from the designer's toolkit to integrate the needs of people, the possibilities of technology, and the requirements for business success. It is a non-linear, iterative process that teams use to understand users, challenge assumptions, redefine problems, and create innovative solutions to prototype and test.
- Asia > Singapore (0.27)
- Europe > Netherlands > South Holland > Delft (0.04)
- Education > Educational Setting > Higher Education (1.00)
- Education > Assessment & Standards (1.00)
- Education > Educational Technology > Educational Software > Computer-Aided Assessment (0.68)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.86)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.70)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
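The score comparison described in the study above can be sketched with a quadratic-weighted Cohen's kappa, a common agreement statistic for ordinal rubric scores. The function and the ten score pairs below are illustrative only; the paper does not state which agreement statistic it used, and these are not its data.

```python
from collections import Counter

def quadratic_weighted_kappa(a, b, n_levels):
    """Quadratic-weighted Cohen's kappa for two raters on an
    ordinal scale 0..n_levels-1 (1.0 = perfect agreement)."""
    assert len(a) == len(b)
    n = len(a)
    obs = Counter(zip(a, b))          # observed joint counts
    ca, cb = Counter(a), Counter(b)   # marginal counts per rater
    num = den = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            w = (i - j) ** 2 / (n_levels - 1) ** 2  # quadratic disagreement weight
            num += w * obs.get((i, j), 0) / n
            den += w * (ca.get(i, 0) / n) * (cb.get(j, 0) / n)
    return 1.0 - num / den

# Hypothetical 0-4 rubric scores for ten posters, one dimension.
ta = [3, 4, 2, 3, 1, 4, 2, 3, 0, 2]
ai = [2, 4, 3, 3, 2, 3, 2, 4, 1, 2]
print(round(quadratic_weighted_kappa(ta, ai, 5), 3))
```

A kappa near zero would match the "low statistical agreement" reported for empathy and pain points.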
Artificial-Intelligence Grading Assistance for Handwritten Components of a Calculus Exam
Kortemeyer, Gerd, Caspar, Alexander, Horica, Daria
We investigate whether contemporary multimodal LLMs can assist with grading open-ended calculus at scale without eroding validity. In a large first-year exam, students' handwritten work was graded by GPT-5 against the same rubric used by teaching assistants (TAs), with fractional credit permitted; TA rubric decisions served as ground truth. We calibrated a human-in-the-loop filter that combines a partial-credit threshold with an Item Response Theory (2PL) risk measure based on the deviation between the AI score and the model-expected score for each student-item. Unfiltered AI-TA agreement was moderate, adequate for low-stakes feedback but not for high-stakes use. Confidence filtering made the workload-quality trade-off explicit: under stricter settings, AI delivered human-level accuracy, but also left roughly 70% of the items to be graded by humans. Psychometric patterns were constrained by low stakes on the open-ended portion, a small set of rubric checkpoints, and occasional misalignment between designated answer regions and where work appeared. Practical adjustments, such as slightly higher weight and protected time, a few rubric-visible substeps, and stronger spatial anchoring, should raise ceiling performance. Overall, calibrated confidence and conservative routing enable AI to reliably handle a sizable subset of routine cases while reserving expert judgment for ambiguous or pedagogically rich responses.
- North America > United States > Michigan (0.04)
- North America > United States > New York (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- (2 more...)
- Instructional Material (0.68)
- Research Report (0.50)
- Education > Assessment & Standards (0.93)
- Education > Curriculum > Subject-Specific Education (0.70)
- Education > Educational Setting (0.66)
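The routing filter described in the abstract above combines a partial-credit check with a 2PL deviation measure. A minimal sketch of that idea follows; the threshold values and the item tuples are hypothetical, not the paper's calibrated settings.

```python
import math

def expected_score(theta, a, b):
    """2PL item response function: model-expected (fractional) score for a
    student of ability theta on an item with discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def route(ai_score, theta, a, b, lo=0.3, hi=1.0, risk_cap=0.35):
    """Route an item to a human grader when the AI awarded ambiguous partial
    credit, or when the AI score deviates too far from the 2PL expectation.
    Thresholds lo and risk_cap are illustrative, not the paper's values."""
    if lo < ai_score < hi:  # ambiguous partial credit
        return "human"
    risk = abs(ai_score - expected_score(theta, a, b))
    return "human" if risk > risk_cap else "ai"

# Hypothetical (AI fractional score, ability, discrimination, difficulty) tuples.
items = [(1.0, 1.2, 1.0, 0.0), (0.5, -0.5, 1.2, 0.3), (0.0, -1.5, 0.8, 0.5)]
print([route(*item) for item in items])
```

Tightening `risk_cap` routes more items to humans, which is the workload-quality trade-off the abstract describes.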
A Review of Generative AI in Computer Science Education: Challenges and Opportunities in Accuracy, Authenticity, and Assessment
Reihanian, Iman, Hou, Yunfei, Chen, Yu, Zheng, Yifei
This paper surveys the use of Generative AI tools, such as ChatGPT and Claude, in computer science education, focusing on key aspects of accuracy, authenticity, and assessment. Through a literature review, we highlight both the challenges and opportunities these AI tools present. While Generative AI improves efficiency and supports creative student work, it raises concerns such as AI hallucinations, error propagation, bias, and blurred lines between AI-assisted and student-authored content. Human oversight is crucial for addressing these concerns. Existing literature recommends adopting hybrid assessment models that combine AI with human evaluation, developing bias detection frameworks, and promoting AI literacy for both students and educators. Our findings suggest that the successful integration of AI requires a balanced approach, considering ethical, pedagogical, and technical factors. Future research may explore enhancing AI accuracy, preserving academic integrity, and developing adaptive models that balance creativity with precision.
- Europe > Finland > Southwest Finland > Turku (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Education > Educational Setting (1.00)
- Education > Curriculum > Subject-Specific Education (0.90)
- Education > Assessment & Standards (0.68)
- Education > Educational Technology > Educational Software (0.68)
Pensieve Grader: An AI-Powered, Ready-to-Use Platform for Effortless Handwritten STEM Grading
Yang, Yoonseok, Kim, Minjune, Rondinelli, Marlon, Shao, Keren
Grading handwritten, open-ended responses remains a major bottleneck in large university STEM courses. We introduce Pensieve (https://www.pensieve.co), an AI-assisted grading platform that leverages large language models (LLMs) to transcribe and evaluate student work, providing instructors with rubric-aligned scores, transcriptions, and confidence ratings. Unlike prior tools that focus narrowly on specific tasks like transcription or rubric generation, Pensieve supports the entire grading pipeline-from scanned student submissions to final feedback-within a human-in-the-loop interface. Pensieve has been deployed in real-world courses at over 20 institutions and has graded more than 300,000 student responses. We present system details and empirical results across four core STEM disciplines: Computer Science, Mathematics, Physics, and Chemistry. Our findings show that Pensieve reduces grading time by an average of 65%, while maintaining a 95.4% agreement rate with instructor-assigned grades for high-confidence predictions.
- North America > United States (0.04)
- Asia > China > Anhui Province > Hefei (0.04)
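Pensieve's reported 95.4% agreement on high-confidence predictions suggests a simple confidence-gated workflow: auto-accept AI scores above a confidence threshold and route the rest to instructors. A minimal sketch of that gating follows; the field names, threshold, and records are invented for illustration and are not Pensieve's API or data.

```python
def gated_agreement(records, conf_threshold=0.9):
    """Split AI-graded records into auto-accepted (high confidence) and
    human-routed, and report AI-instructor agreement on the accepted set."""
    accepted = [r for r in records if r["confidence"] >= conf_threshold]
    routed = [r for r in records if r["confidence"] < conf_threshold]
    agree = sum(1 for r in accepted if r["ai"] == r["instructor"])
    rate = agree / len(accepted) if accepted else float("nan")
    return len(routed), rate

# Hypothetical graded responses with AI confidence ratings.
records = [
    {"ai": 8, "instructor": 8, "confidence": 0.97},
    {"ai": 5, "instructor": 5, "confidence": 0.93},
    {"ai": 7, "instructor": 6, "confidence": 0.95},
    {"ai": 4, "instructor": 7, "confidence": 0.42},
]
print(gated_agreement(records))
```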
A comparison of Human, GPT-3.5, and GPT-4 Performance in a University-Level Coding Course
Yeadon, Will, Peach, Alex, Testrow, Craig P.
This study evaluates the performance of ChatGPT variants, GPT-3.5 and GPT-4, both with and without prompt engineering, against solely student work and a mixed category containing both student and GPT-4 contributions in university-level physics coding assignments using the Python language. Comparing 50 student submissions to 50 AI-generated submissions across different categories, all marked blindly by three independent markers, we amassed $n = 300$ data points. Students averaged 91.9% (SE: 0.4), surpassing the highest-performing AI submission category, GPT-4 with prompt engineering, which scored 81.1% (SE: 0.8) - a statistically significant difference (p = $2.482 \times 10^{-10}$). Prompt engineering significantly improved scores for both GPT-4 (p = $1.661 \times 10^{-4}$) and GPT-3.5 (p = $4.967 \times 10^{-9}$). Additionally, the blinded markers were tasked with guessing the authorship of the submissions on a four-point Likert scale from 'Definitely AI' to 'Definitely Human'. They accurately identified the authorship, with 92.1% of the work categorized as 'Definitely Human' being human-authored. Simplifying this to a binary 'AI' or 'Human' categorization resulted in an average accuracy rate of 85.3%. These findings suggest that while AI-generated work closely approaches the quality of university students' work, it often remains detectable by human evaluators.
- Research Report > New Finding (1.00)
- Instructional Material > Course Syllabus & Notes (0.84)
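The collapse from the four-point Likert authorship scale to a binary AI/Human accuracy figure, as described in the abstract above, can be sketched as follows. The two endpoint labels mirror the abstract; the intermediate labels and all data are invented for illustration.

```python
def binary_accuracy(guesses, truths):
    """Collapse four-point Likert authorship guesses into a binary
    AI/Human call and score them against the true authorship."""
    to_binary = {
        "Definitely AI": "AI", "Probably AI": "AI",          # assumed labels
        "Probably Human": "Human", "Definitely Human": "Human",
    }
    calls = [to_binary[g] for g in guesses]
    hits = sum(c == t for c, t in zip(calls, truths))
    return hits / len(truths)

# Toy marker guesses and true authorship for four submissions.
guesses = ["Definitely Human", "Probably AI", "Definitely AI", "Probably Human"]
truths = ["Human", "AI", "Human", "Human"]
print(binary_accuracy(guesses, truths))
```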
Leveraging Human Feedback to Scale Educational Datasets: Combining Crowdworkers and Comparative Judgement
Machine Learning models have many potentially beneficial applications in education settings, but a key barrier to their development is securing enough data to train these models. Labelling educational data has traditionally relied on highly skilled raters using complex, multi-class rubrics, making the process expensive and difficult to scale. An alternative, more scalable approach could be to use non-expert crowdworkers to evaluate student work; however, maintaining sufficiently high levels of accuracy and inter-rater reliability when using non-expert workers is challenging. This paper reports on two experiments investigating using non-expert crowdworkers and comparative judgement to evaluate complex student data. Crowdworkers were hired to evaluate student responses to open-ended reading comprehension questions. Crowdworkers were randomly assigned to one of two conditions: the control, where they were asked to decide whether answers were correct or incorrect (i.e., a categorical judgement), or the treatment, where they were shown the same question and answers, but were instead asked to decide which of two candidate answers was more correct (i.e., a comparative/preference-based judgement). We found that using comparative judgement substantially improved inter-rater reliability on both tasks. These results are in line with well-established literature on the benefits of comparative judgement in the field of educational assessment, as well as with recent trends in artificial intelligence research, where comparative judgement is becoming the preferred method for providing human feedback on model outputs when working with non-expert crowdworkers. However, to our knowledge, these results are novel and important in demonstrating the beneficial effects of using the combination of comparative judgement and crowdworkers to evaluate educational data.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > Armenia (0.04)
- (2 more...)
- Research Report > New Finding (0.88)
- Research Report > Experimental Study (0.67)
- Research Report > Strength High (0.54)
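Pairwise "which answer is more correct" judgements like those in the treatment condition above are commonly scaled into a ranking with a Bradley-Terry model. A minimal sketch using the standard minorize-maximize update follows; the data are toy judgements, and the paper does not say which scaling model, if any, it used.

```python
from collections import defaultdict

def bradley_terry(pairs, n_iter=200):
    """Fit Bradley-Terry strengths from (winner, loser) pairwise judgements
    using the standard MM update; higher strength = preferred more often."""
    items = {x for pair in pairs for x in pair}
    wins = defaultdict(int)
    matches = defaultdict(int)  # unordered pair -> number of comparisons
    for winner, loser in pairs:
        wins[winner] += 1
        matches[frozenset((winner, loser))] += 1
    p = {i: 1.0 for i in items}
    for _ in range(n_iter):
        new = {}
        for i in items:
            denom = 0.0
            for key, m in matches.items():
                if i in key:
                    j = next(x for x in key if x != i)
                    denom += m / (p[i] + p[j])
            new[i] = wins[i] / denom if denom else p[i]
        total = sum(new.values())
        p = {i: v / total for i, v in new.items()}  # normalise strengths
    return p

# Toy judgements: answer A beats B twice, B beats C once, A beats C once.
pairs = [("A", "B"), ("A", "B"), ("B", "C"), ("A", "C")]
strengths = bradley_terry(pairs)
print(max(strengths, key=strengths.get))
```

The same family of models underlies preference-based human feedback in recent AI research, which is the connection the abstract draws.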
How ChatGPT Can Help with Grading • TechNotes Blog
I enjoy teaching, but I don't enjoy grading. Using rubrics makes grading easier, but it can still be a chore to review each assignment with fresh eyes so that the student you are grading now gets the same attention as the first few. This is where ChatGPT can come in handy. It doesn't get tired of grading. And if you have a tight rubric (well-designed with little or no loopholes), you can expect consistent results from ChatGPT, but there are a few important things to consider.
AI arrives on college campuses: How students are using ChatGPT for essays, research and more
Ready or not, the AI revolution is upon us and one of its most immediate impacts is the emergence of chatbots like ChatGPT. "It will be a boon to the societies that pick this up," said junior student leader and president of the Metropolitan State University of Denver Chess Club Paul Nelson. Nelson is talking about ChatGPT and its rapid emergence on college campuses throughout the U.S. One educator at MSU Denver said the first time he heard of the chatbot was in November and now, four months later, it's a part of almost every conversation he has. "My first reaction when I first saw ChatGPT was, 'Oh my God. We are in trouble,'" said Dr. David Merriam, assistant professor of biology.
- Education (0.73)
- Leisure & Entertainment > Games > Chess (0.56)
UAB cybersecurity program ranked No. 1 - Yellowhammer News
Fortune ranked the University of Alabama at Birmingham's in-person master's degree in cybersecurity as the No. 1 program in the country. According to Fortune, there are nearly 770,000 cybersecurity job openings in the United States. "We are proud to be recognized for academic excellence by Fortune and named the nation's leading institution for graduate studies in cybersecurity," said UAB Provost and Senior Vice President for Academic Affairs Pam Benoit. "UAB's Department of Computer Science has created an outstanding collaborative master's degree program that prepares students to lead careers solving the world's most challenging cybersecurity problems." Fortune's first-ever ranking of in-person cybersecurity master's degree programs compared 14 programs across the United States in three components: Selectivity Score, Success Score and Demand Score.
- Information Technology > Security & Privacy (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Government > Military > Cyberwarfare (1.00)
- Education > Educational Setting > Higher Education (1.00)
Various Roles of AI (Artificial Intelligence) in Education
The role of AI in education is to provide personalized learning experiences for students and to assist educators in the classroom. AI can provide students with individualized feedback and recommendations based on their learning progress. AI can also help educators to identify areas where students may need extra support. Thus, in this blog post, I shall highlight the roles, AI can play in teaching, learning, and assessment. AI for Teaching Let us see, what role AI can play in teaching to improve the learning outcome. The role of AI in teaching is to provide educators with tools and resources that...
- Education > Educational Technology > Educational Software > Computer Based Training (1.00)
- Education > Educational Setting (0.95)